Towards Learning Generalizable Code Embeddings Using Task-agnostic Graph Convolutional Networks
Abstract
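Code embeddings have seen increasing applications in software engineering (SE) research and practice recently. Despite the advances in embedding techniques applied in SE research, one of the main challenges is their generalizability. A recent study finds that code embeddings may not be readily leveraged for downstream tasks that the embeddings are not particularly trained for. Therefore, in this article, we propose GraphCodeVec, which represents the source code as graphs and leverages Graph Convolutional Networks to learn more generalizable code embeddings in a task-agnostic manner. The edges in the graph representation are automatically constructed from the paths in the abstract syntax trees, and the nodes from the tokens in the source code. To evaluate the effectiveness of GraphCodeVec, we consider three downstream benchmark tasks (i.e., code comment generation, code authorship identification, and code clones detection) that are used in a prior benchmarking of code embeddings and add three new downstream tasks (i.e., source code classification, logging statements prediction, and software defect prediction), resulting in a total of six downstream tasks that are considered in our evaluation. For each downstream task, we apply the embeddings learned by GraphCodeVec and the embeddings learned by the four baseline approaches and compare their respective performance. We find that GraphCodeVec outperforms all the baselines in five out of the six downstream tasks, and its performance is relatively stable across different tasks and datasets. In addition, we perform ablation experiments to understand the impacts of the training context (i.e., the graph context extracted from the abstract syntax trees) and the training model (i.e., the Graph Convolutional Networks) on the effectiveness of the generated embeddings. The results show that both the graph context and the Graph Convolutional Networks can benefit GraphCodeVec in producing high-quality embeddings for the downstream tasks, while the improvement by the Graph Convolutional Networks is more robust across different downstream tasks and datasets. Our findings suggest that future research and practice may consider using graph-based deep learning methods to capture the structural information of the source code for SE tasks.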
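The general idea in the abstract (tokens as nodes, edges derived from abstract syntax tree paths, and a graph convolution over the result) can be illustrated with a small sketch. This is not the GraphCodeVec implementation: it assumes Python source parsed with the standard `ast` module, links two tokens when the tree path between their leaves is at most `max_path_len` edges (a hypothetical parameter), and applies one untrained graph convolution layer of the commonly used form ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The helper names `leaf_tokens`, `build_token_graph`, and `gcn_layer` are illustrative only.

```python
import ast
import math
from itertools import combinations, count


def leaf_tokens(source):
    """Collect (token, root_path) pairs for identifier and constant leaves.

    root_path is a tuple of per-node integer labels from the AST root
    down to the leaf, so two leaves share a prefix up to their lowest
    common ancestor.
    """
    tree = ast.parse(source)
    labels = count()
    leaves = []

    def visit(node, path):
        path = path + (next(labels),)
        if isinstance(node, ast.Name):
            leaves.append((node.id, path))
        elif isinstance(node, ast.arg):
            leaves.append((node.arg, path))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), path))
        for child in ast.iter_child_nodes(node):
            visit(child, path)

    visit(tree, ())
    return leaves


def build_token_graph(source, max_path_len=6):
    """Return (vocab, adjacency): one node per distinct token, with an
    undirected edge when some pair of occurrences is joined by an AST
    path of at most max_path_len edges (assumed rule, for illustration)."""
    leaves = leaf_tokens(source)
    vocab = sorted({tok for tok, _ in leaves})
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for (tok_a, path_a), (tok_b, path_b) in combinations(leaves, 2):
        if tok_a == tok_b:
            continue
        common = 0
        for x, y in zip(path_a, path_b):
            if x != y:
                break
            common += 1
        # Edges from leaf a up to the common ancestor, then down to leaf b.
        dist = len(path_a) + len(path_b) - 2 * common
        if dist <= max_path_len:
            i, j = index[tok_a], index[tok_b]
            adj[i][j] = adj[j][i] = 1.0
    return vocab, adj


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 @ feats @ weight)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Usage: one-hot token features and a fixed, untrained weight matrix.
snippet = "def add(a, b):\n    total = a + b\n    return total\n"
vocab, adj = build_token_graph(snippet)
one_hot = [[1.0 if i == j else 0.0 for j in range(len(vocab))] for i in range(len(vocab))]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
embeddings = gcn_layer(adj, one_hot, weight)
for token, vector in zip(vocab, embeddings):
    print(token, vector)
```

In a real system the weight matrix would be learned, and the training objective (which the abstract describes only as task-agnostic) determines what the resulting vectors capture; the sketch shows only how structural neighbours in the tree end up mixing their features.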
Similar resources
Towards Generalizable Sentence Embeddings
In this work, we evaluate different sentence encoders with emphasis on examining their embedding spaces. Specifically, we hypothesize that a “high-quality” embedding aids in generalization, promoting transfer learning as well as zero-shot and one-shot learning. To investigate this, we modify Skipthought vectors to learn a more generalizable space by exploiting a small amount of supervision. The...
Convolutional 2D Knowledge Graph Embeddings
Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow, fast models which can scale to large knowledge graphs. However, these models learn less expressive features than deep, multi-layer models – which potentially limits performance. In this work we introduce ConvE, a multi-layer convolutiona...
Towards Language-Agnostic Mobile Code
The Java Virtual Machine is primarily designed for transporting Java programs. As a consequence, when JVM bytecodes are used to transport programs in other languages, the result becomes less acceptable the more the source language diverges from Java. Microsoft’s .NET transport format fares better in this respect because it has a more flexible type system and instruction set, but it is not exten...
Towards Incremental Learning with Deep Convolutional Networks
Deep neural networks are a powerful class of machine learning models. However they require a lot of time and computational resources to train. We propose to apply an incremental learning approach to train models by utilizing the information present in pre-trained models. We build towards this goal by studying the relationship between network architecture, categories in training data, the amount...
Protein Interface Prediction using Graph Convolutional Networks
We consider the prediction of interfaces between proteins, a challenging problem with important applications in drug discovery and design, and examine the performance of existing and newly proposed spatial graph convolution operators for this task. By performing convolution over a local neighborhood of a node of interest, we are able to stack multiple layers of convolution and learn effective l...
Journal
Journal title: ACM Transactions on Software Engineering and Methodology
Year: 2023
ISSN: 1049-331X, 1557-7392
DOI: https://doi.org/10.1145/3542944